SHrinkage Covariance Estimation Incorporating Prior Biological Knowledge with Applications to High-Dimensional Data
Abstract
In “-omic” data analysis, information on the structure of covariates is broadly available, either from public databases describing gene regulation processes and functional groups, such as the Kyoto Encyclopedia of Genes and Genomes (KEGG), or from statistical analyses, for example in the form of partial correlation estimators. The analysis of transcriptomic data might benefit from the incorporation of such prior knowledge. In this paper we focus on the integration of structured information into statistical analyses in which at least one major step involves the estimation of a (high-dimensional) covariance matrix. More precisely, we revisit the recently proposed “SHrinkage Incorporating Prior” (SHIP) covariance estimation method, which takes into account the group structure of the covariates, and propose integrating the SHIP covariance estimator into various multivariate methods such as linear discriminant analysis (LDA), global analysis of covariance (GlobalANCOVA), and regularized generalized canonical correlation analysis (RGCCA). We demonstrate the use of the resulting new methods ...
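As a rough illustration of the kind of estimator the abstract describes, the sketch below shrinks the sample covariance matrix toward a target that encodes group membership of the covariates. The particular target (sample variances on the diagonal, a common within-group correlation off the diagonal, zero across groups), the function names `group_target` and `shrink_covariance`, and the fixed shrinkage intensity `lam` are assumptions made for illustration. This is not the SHIP estimator as defined in the paper, and shrinkage methods of this type typically estimate the intensity from the data instead of fixing it.

```python
import numpy as np


def group_target(S, groups):
    """Build a group-structured shrinkage target from a sample covariance S.

    Assumed target (illustrative only):
      - diagonal: the sample variances from S
      - entry (i, j), i != j, same group: r_bar * sqrt(s_ii * s_jj)
      - entry (i, j), different groups: 0
    where r_bar is the mean sample correlation over all within-group pairs.
    """
    groups = np.asarray(groups)
    sd = np.sqrt(np.diag(S))
    R = S / np.outer(sd, sd)  # sample correlation matrix

    # Boolean mask of off-diagonal pairs that share a group label.
    same = groups[:, None] == groups[None, :]
    np.fill_diagonal(same, False)

    r_bar = R[same].mean() if same.any() else 0.0
    T = np.where(same, r_bar * np.outer(sd, sd), 0.0)
    T[np.diag_indices_from(T)] = np.diag(S)
    return T


def shrink_covariance(X, groups, lam):
    """Return lam * T + (1 - lam) * S for data X (rows = samples).

    lam is a user-supplied intensity in [0, 1] (hypothetical parameter).
    """
    S = np.cov(X, rowvar=False)
    T = group_target(S, groups)
    return lam * T + (1.0 - lam) * S
```

With `lam = 0` the function returns the sample covariance unchanged, and with `lam = 1` it returns the target. Intermediate values leave the variances untouched and pull cross-group covariances toward zero.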
Similar resources
Regularized Discriminant Analysis Incorporating Prior Knowledge on Gene Functional Groups
In the last decade, the renaissance of interest in discriminant analysis has been primarily motivated by possible applications to tumor classification using high-dimensional microarray-based data. In this thesis, we do three things: 1. First, we introduce a new regularizing covariance estimation procedure we refer to as SHIP: SHrinking and Incorporating Prior knowledge. The resulting covariance ...
Incorporating prior knowledge of gene functional groups into regularized discriminant analysis of microarray data
MOTIVATION: Discriminant analysis for high-dimensional and low-sample-sized data has become a hot research topic in bioinformatics, mainly motivated by its importance and challenge in applications to tumor classification for high-dimensional microarray data. Two of the popular methods are the nearest shrunken centroids, also called predictive analysis of microarray (PAM), and shrunken centroids...
Nonparametric Stein-type shrinkage covariance matrix estimators in high-dimensional settings
Estimating a covariance matrix is an important task in applications where the number of variables is larger than the number of observations. In the literature, shrinkage approaches for estimating a high-dimensional covariance matrix are employed to circumvent the limitations of the sample covariance matrix. A new family of nonparametric Stein-type shrinkage covariance estimators is proposed who...
Positive-Shrinkage and Pretest Estimation in Multiple Regression: A Monte Carlo Study with Applications
Consider the problem of predicting a response variable using a set of covariates in a linear regression model. If it is a priori known or suspected that a subset of the covariates does not contribute significantly to the overall fit of the model, a restricted model that excludes these covariates may be sufficient. If, on the other hand, the subset provides useful information, shrinkage meth...
Penalized model-based clustering with cluster-specific diagonal covariance matrices and grouped variables.
Clustering analysis is one of the most widely used statistical tools in many emerging areas such as microarray data analysis. For microarray and other high-dimensional data, the presence of many noise variables may mask underlying clustering structures. Hence removing noise variables via variable selection is necessary. For simultaneous variable selection and parameter estimation, existing pena...